Heart failure survival prediction using novel transfer learning based probabilistic features

Heart failure is a complex cardiovascular condition characterized by the heart’s inability to pump blood effectively, leading to a cascade of physiological changes. Predicting survival in heart failure patients is crucial for optimizing patient care and resource allocation. This research aims to develop a robust survival prediction model for heart failure patients using advanced machine learning techniques. We analyzed data from 299 hospitalized heart failure patients, addressing the issue of imbalanced data with the Synthetic Minority Oversampling Technique (SMOTE). Additionally, we propose a novel transfer learning-based feature engineering approach that generates a new probabilistic feature set from patient data using ensemble trees. Nine fine-tuned machine learning models are built and compared to evaluate performance in patient survival prediction. Our novel transfer learning mechanism applied to the random forest model outperformed other models and state-of-the-art studies, achieving an accuracy of 0.975. All models were evaluated using 10-fold cross-validation and tuned through hyperparameter optimization. The findings of this study have the potential to advance the field of cardiovascular medicine by providing more accurate and personalized prognostic assessments for individuals with heart failure.


INTRODUCTION
The primary cause of heart failure (HF) is coronary artery disease (CAD), often precipitated by arterial blockages leading to heart attacks. Heart disease or high blood pressure is also associated with HF (Gjoreski et al., 2017). Various factors contribute to heart diseases, including recognized parameters such as alcohol intake, smoking, diabetes, high cholesterol, and a lack of exercise. Previous research has identified high blood sugar, poor diet, excess weight (Benjamin et al., 2019), and unhealthy activities as significant causes of heart disease. Elevated blood pressure thickens artery walls, impeding blood flow and contributing to a higher mortality rate (Sugathan, Soman & Sankaranarayanan, 2008). healthcare strategies. The research takes innovative strides in tailoring treatments for heart failure patients based on their medical histories, aiming to make substantial contributions to saving lives in the realm of heart failure management. This study's key contributions to predicting survival in heart failure patients are:
• A novel transfer learning-based feature engineering approach is proposed, which generates a new probabilistic feature set from patient data using the ensemble trees method, random forest.
• We have constructed nine fine-tuned machine learning models, including logistic regression, random forest, support vector machine, decision tree, XGBoost classifier, Gaussian naive Bayes, k-nearest neighbors, extra tree classifier, and gradient boosting classifier.
• The performance of the applied models has been validated through 10-fold cross-validation, with tuning carried out via hyperparameter optimization.
The remainder of this paper is organized as follows: 'Related Work' examines the literature on heart failure and provides overviews of previously conducted studies. 'Proposed Methodology' explains the applied methodology for heart failure, discussing the research design, data sources, data collection methods, and data analysis techniques. 'Results and Discussion' elaborates on the research outcomes of the employed machine learning models and discusses our research approach in light of the experimental results. 'Conclusion' offers a concise summary of our research, recapping the key points and presenting the overall significance of our work.

RELATED WORK
This section explores the existing literature and studies in the field, providing an overview of the advancements and methodologies applied to predict heart failure survival. Numerous studies have delved into heart failure prediction using traditional machine learning approaches, as summarized in Table 1.
Mansur Huang, Ibrahim & Mat Diah (2021) proposed to predict heart failure in patients using the UCI heart disease dataset. Employing a machine learning technique, the authors developed a random forest model with 13 features. The achieved accuracy in heart failure prediction was 88 percent. Similarly, Mamun et al. (2022) focused on predicting survival in heart failure patients. The UCI HF dataset, comprising 299 patient records, was used for this analysis. The authors tested several machine learning models, with LightGBM identified as the optimal classifier. LightGBM achieved an accuracy score of 85 percent in predicting patient survival, outperforming the other classifiers in the study. Newaz, Ahmed & Haq (2021) proposed a technique aimed at preventing heart failure in patients. The dataset employed in this research was the HF clinical record dataset collected from the Allied Hospital of Cardiology in Faisalabad, comprising 299 patient records. The study utilized a machine learning approach and introduced an ensemble framework strategy to enhance the robustness of the random forest (RF) model, addressing data imbalance issues. Feature engineering involved the use of Chi-squared and recursive feature analysis. The proposed random forest model performed better than the other models, achieving an accuracy rate of 76.83 percent for predicting the survival of heart patients. Similarly, Plati et al. (2021) established a systematic process for employing machine learning approaches to diagnose the presence of heart failure. This research is noteworthy for its impact on clinical procedures and its exploration of how different features influence classification correctness scores. Notably, when the entire set of attributes was employed for classification, the results for heart failure diagnosis reached an accuracy of 91.23 percent, a sensitivity of 93.83 percent, and a specificity of 89.62 percent. The findings of both studies contribute valuable insights to the field of heart failure prediction, with Newaz, Ahmed & Haq (2021) addressing data imbalance challenges through ensemble strategies and Plati et al. (2021) emphasizing the significance of feature selection in improving diagnostic accuracy.
Hussain et al. (2020) developed a method to automatically extract multi-model properties from heart rate variability (HRV) data, capturing its temporal, spectral, and dynamic characteristics. Robust machine learning algorithms, including support vector machine (SVM) with its kernels, decision trees (DT), k-nearest neighbors (KNN), and ensemble classifiers, were employed to evaluate detection performance. Performance metrics such as specificity and sensitivity were utilized to assess the algorithms. The SVM with a linear kernel performed best, achieving a correctness score of 93.1 percent, a sensitivity of 96 percent, and a specificity of 89 percent. This underscores the effectiveness of the proposed method in accurately analyzing HRV data for comprehensive cardiac assessment.
Javid, Alsaedi & Ghazali (2020) aimed to predict heart disease through the application of machine learning (ML) and deep learning (DL) approaches for enhanced accuracy. The research employed the UCI heart disease dataset to evaluate the effectiveness of ML and DL methods. A voting-based method combining several algorithms was utilized to enhance the accuracy of weak classifiers. The proposed ensemble technique achieved an accuracy of 85.71 percent with all attributes considered, an improvement of 2.1 percent in accuracy. Similarly, Kumar & Sikamani (2020) focused on predicting chronic heart disease. The UCI repository served as the dataset for this study, wherein a machine learning approach was employed. Multiple classifiers, utilizing 13 attributes as indicators of the disease, were explored. The Hoeffding Classifier emerged as the top-performing classifier, achieving an accuracy score of 88.56 percent in predicting chronic heart disease. Ashraf et al. (2021) utilized deep learning technology to predict cardiovascular disease (CVD). The dataset employed in this study was sourced from the Stanford online repository. Various forecasting approaches were applied to this dataset, including ensemble and learning methods. Among classifiers, J48 achieved a 70 percent accuracy score. The same dataset underwent analysis using a novel approach, incorporating TensorFlow, Keras, and PyTorch techniques. The results of the analysis indicated that J48 outperformed other models, with an 80 percent accuracy score in predicting CVD. Following a comprehensive analysis, the study concluded that both conventional and cutting-edge technologies present a novel approach for predicting CVD.
Pal & Parija (2021) suggested heart disease classification using the Kaggle Cleveland heart disease dataset, which consists of 14 features. Employing a machine learning approach, the study utilized a random forest model, achieving an accuracy of 86.9 percent. The evaluation concluded that the random forest classifier is the most efficient for predicting heart disease. Similarly, Mohan, Thirumalai & Srivastava (2019) proposed an innovative approach to predicting heart disease. The dataset employed in this study was the Cleveland Heart dataset, comprising 13 features. A novel hybrid machine learning approach was applied to select optimal features, and a hybrid random forest with a linear model was employed for heart disease prediction, achieving an accuracy of 88.7 percent. The findings suggest that the hybrid model is the most effective method for improving the accuracy of predicting heart disease.
Al-Absi et al. (2021) developed ML classifiers to distinguish cardiovascular disease from the control group using data obtained from the Qatar Biobank dataset (QBB). This dataset comprises 150 features extracted from various clinical records of residents in Qatar. Employing a machine learning approach, the study demonstrated that the proposed CatBoost algorithm outperformed other methods, achieving an accuracy of 93 percent in cardiovascular disease (CVD) detection. Ishaq et al. (2021) presented an effective approach for predicting heart failure patient survival. The dataset utilized in this research was the HF clinical record dataset downloaded from Kaggle, encompassing records from 299 patients. Employing a data mining approach, the study aimed to select optimal features to enhance the accuracy of predicting patients' survival. To address class imbalance, the Synthetic Minority Oversampling Technique (SMOTE) was employed. Nine ML models were utilized to predict heart failure patient survival, with the Extra Trees Classifier (ETC) outperforming other models. It achieved an accuracy of 92.62 percent, using the highest-ranked features selected by RF to predict the survival of heart patients.

PROPOSED METHODOLOGY
For heart failure survival prediction, this research utilized the HF clinical record dataset, which comprises records from 299 patients. The dataset underwent processing to ensure proper formatting. Exploratory data analysis was conducted to understand the data's structure and characteristics relevant to heart failure. The class imbalance identified in the dataset presented a prediction challenge, which was addressed by applying SMOTE to balance the dataset. Subsequently, the proposed transfer learning feature engineering method was applied to the balanced data, creating a new probabilistic feature set. Following this, the data was partitioned into training and testing sets, allocating 80 percent for training and 20 percent for testing the model on unseen data. Nine state-of-the-art machine learning models were built using the training data, and their performance was assessed on the unseen test data. The best-fit parameters for the machine learning models were determined through hyperparameter tuning. The best-performing classifier aims to forecast heart failure survival with improved efficiency. The research methodology workflow diagram is illustrated in Fig. 1.

Dataset details
The HF Clinical Records dataset (Ahmad et al., 2017) is also available in the UCI Machine Learning Repository (UCI Machine Learning Repository, 2020). The dataset comprises health records of 299 patients with cardiac issues, and each individual profile includes thirteen clinical variables. There are 194 men, representing 64.88 percent, and 105 women, representing 35.12 percent, in the dataset. All patients are aged 40 or older. A label of 1 denotes a death event, while 0 denotes survival. The dataset has no missing values. All patients were in New York Heart Association (NYHA) classes III and IV, had left ventricular systolic dysfunction, and had previously experienced HF. The dataset details are presented in Table 2.

Exploratory data analysis
To gain a deeper insight into the causes of HF, this section examines cardiac statistics and diverse dataset patterns. The exploratory analysis is used to assess the significance of the 13 variables on which the study concentrates and which are used to train the ML models. These attributes are evaluated from different perspectives, and count charts drawn with Matplotlib are employed for visualization. Figure 2 displays the number of examples in each group within the HF dataset, illustrating the gender (sex) distribution: 0 denotes females (105, constituting 35.12 percent of the dataset) and 1 denotes males (194, constituting 64.88 percent).
The input attributes for the data are all numeric. Figure 3 presents the correlation evaluation of the heart failure dataset's attributes. According to the analysis, all features exhibit a robust relationship. Some attributes show low negative correlations, such as ejection fraction and serum sodium. Notably, only the feature ''time'' displays a strong negative correlation in the dataset. The analysis highlights a strong relationship among the

Synthetic minority oversampling
SMOTE is an oversampling technique often utilized to address class imbalance in data, and it has found widespread application in medical contexts to handle unbalanced class data (Blagus & Lusa, 2015). SMOTE expands the number of samples by generating synthetic minority-class data: each synthetic sample is placed at a random position between a minority-class sample and one of its nearest neighbors. As these new samples are created based on actual information, they possess comparable attributes. It is important to note that SMOTE may introduce noise when applied to high-dimensional data, and its usage is discouraged in such cases. The SMOTE method is employed to construct a new balanced training dataset. As illustrated in Fig. 4 and Fig. 5, SMOTE increases the number of minority-class samples until the two classes are balanced.
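The interpolation step described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the implementation used in the study: the function name `smote_sample`, the toy `minority` points, and the parameter values are hypothetical, and a library implementation would normally be used in practice.

```python
import random

def smote_sample(minority, k=5, n_new=10, seed=0):
    """Create n_new synthetic points, each lying on the segment between a
    minority sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic

# Hypothetical two-feature minority-class samples.
minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
new_points = smote_sample(minority, k=2, n_new=6)
```

Because every synthetic point is a convex combination of two real minority samples, it stays inside the range spanned by the observed minority data, which is why the new samples resemble the originals.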

Data splitting
We divided the data into training and testing sets. To apply the machine learning classifiers and generate forecast results on unseen data, we allocated 80 percent of the data for training and 20 percent for testing.
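A shuffled 80:20 partition of row indices can be sketched as follows. The function name, the seed, and the row count of 100 are assumptions made for illustration; the study does not state how the split was shuffled or seeded.

```python
import random

def train_test_split_indices(n_rows, test_fraction=0.2, seed=42):
    """Shuffle row indices and reserve test_fraction of them for testing."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    n_test = round(n_rows * test_fraction)
    # The first n_test shuffled indices form the test set, the rest train.
    return indices[n_test:], indices[:n_test]

# Hypothetical dataset of 100 rows.
train_idx, test_idx = train_test_split_indices(100)
```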

Applied machine learning classifiers
This section explores various machine learning approaches employed in predicting heart failure. It provides an explanation of how machine learning models function and introduces key terminology (Zaidi, Tariq & Belhaouari, 2021). Our proposed study evaluates nine advanced machine learning models for forecasting heart failure. Using supervised machine learning methods, the outcome of heart failure data is predicted.

Logistic regression
Logistic regression (LoR) is a supervised statistical learning technique for classification (Daghistani & Alshammari, 2020). LoR utilizes independent variables to forecast the categorical dependent variable. The estimated probability of the positive class forms the foundation for the learning and prediction processes. In binary logistic regression models, the class variable must be binary. This matches the target column of the dataset, which takes two values: 0 indicates a patient who survived, and 1 denotes a death event.
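As a minimal sketch of how a fitted logistic regression turns features into a class label, the snippet below applies the sigmoid function to a weighted sum. The function names, weights, and the 0.5 threshold are illustrative assumptions; in the study the coefficients would be learned from the training data.

```python
import math

def predict_probability(features, weights, bias):
    """Probability of class 1: sigmoid of the weighted sum of the features."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

def predict_class(features, weights, bias, threshold=0.5):
    """Assign class 1 when the predicted probability reaches the threshold."""
    return 1 if predict_probability(features, weights, bias) >= threshold else 0
```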

Random forest
The RF is a supervised machine learning algorithm consisting of several decision trees (Palimkar, Shaw & Ghosh, 2022). The decision nodes of each tree represent the features, while the leaf nodes indicate the intended outcome. The final forecast is determined by a majority vote after using observations randomly selected by RF to construct decision trees.

Support vector machine
In a support vector machine (SVM), the decision boundary is known as a hyperplane (... Shaukat, 2013). SVM chooses extreme vectors, or support vectors, to create the hyperplane. Because of this, the technique is referred to as a support vector machine.

K-nearest neighbor
KNN is a supervised learning method that predicts the class of a data point by considering information from its closest neighbors. KNN assigns a label based on the data points that are close to the query in terms of similarity (Tajik et al., 2016). This approach is non-parametric, categorizing data points according to their proximity. KNN is a lazy learner: training merely stores the data, so the computational cost is incurred at prediction time, when distances to the stored samples must be computed. The similarity between data points is evaluated using metrics such as Euclidean distance or equivalent distance measures (Jones & Hardiyanti, 2021).
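The neighbor-voting rule can be sketched in a few lines of plain Python using Euclidean distance. The function name `knn_predict` and the default `k=3` are assumptions for illustration only.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Label the query with the majority class among its k nearest
    training points, measured by Euclidean distance."""
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]
```

Note that all the work happens inside `knn_predict`: there is no separate fitting step, which is consistent with the very short training times reported for KNN.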

Decision tree
A decision tree (DT) is a machine learning algorithm used for classification problems (Charbuty & Abdulazeez, 2021). The tree-like structure of a DT comprises nodes and leaves, with data attributes allocated to inner nodes and outcome labels stored in leaf nodes. The topmost node is the root node. Decision tree algorithms automatically generate trees from input data, aiming to minimize generalization error. The primary objective of decision tree classification (DTC) is to identify the optimal decision tree. A noteworthy challenge in decision tree construction is the selection of the attributes on which to split.

Extreme gradient boosting
Extreme Gradient Boosting (XGBoost, XGB) is a supervised ensemble learning approach for classification (Fitriyani et al., 2020). Ensemble learning algorithms combine multiple models to enhance performance. XGB is known for its flexibility, efficiency, and portability. It employs a parallel gradient-boosted tree method to address classification problems. To mitigate overfitting, XGB incorporates built-in regularization.

Extra tree classifier
The ETC represents an advancement in the bagged decision tree-based ensemble learning approach (Ossai & Wickramasinghe, 2022). While the ETC and random forest share some underlying concepts, they differ in the way the trees are generated. In a classification task, the ETC combines the outputs of diverse, de-correlated decision trees to predict the target class. The ETC technique uses the training data to generate multiple decision trees, with the decision rule at each split being chosen randomly. The predictions of the individual trees are then aggregated by majority voting to produce the final forecast.

Gradient boosting classifier
The gradient boosting (GB) technique is a widely employed sequential ensemble learning technique (Rufo et al., 2021). It can be applied to both regression and classification. The GB approach progresses incrementally (Bowd et al., 2020): it constructs the model sequentially, training each base classifier individually and adding it to the ensemble. By combining the outcomes of numerous weak models in this way, GB aims to build a final predictive model that is robust and forecasts accurately.

Gaussian naive Bayes
Gaussian naive Bayes (GNB) is a supervised machine learning algorithm (Barus et al., 2020). The GNB model is based on Bayes' theorem and associated methodologies. The GNB approach (Cataldi, Tiberi & Costa, 2021) assumes that all predictors are independent, which is a strong premise. It posits that, within a class, the value of one feature is unrelated to the value of any other feature. GNB models each feature with a Gaussian distribution and uses the naive independence assumption to forecast the target class.
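The Gaussian and independence assumptions can be made concrete with a small sketch: each class is summarized by a prior and by per-feature means and variances, and a sample is assigned to the class with the highest log-posterior. The function names and the small `var_floor` added to the variances are assumptions introduced here for illustration and numerical stability.

```python
import math
from collections import defaultdict

def fit_gnb(X, y, var_floor=1e-9):
    """Estimate the class prior and per-feature mean and variance per class."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        variances = [
            sum((v - m) ** 2 for v in col) / n + var_floor
            for col, m in zip(zip(*rows), means)
        ]
        model[label] = (n / len(X), means, variances)
    return model

def predict_gnb(model, row):
    """Return the class with the highest log-posterior. Features are treated
    as independent, so their Gaussian log-likelihoods are simply summed."""
    def log_posterior(prior, means, variances):
        score = math.log(prior)
        for x, m, v in zip(row, means, variances):
            score += -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
        return score
    return max(model, key=lambda label: log_posterior(*model[label]))
```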

Novel proposed transfer learning
A novel feature engineering technique is proposed in our research, as shown in Fig. 6. Our proposed approach extracts class probability features (Raza et al., 2023). The random forest model consists of a collection of decision trees, each constructed using a random subset of the training data. When given a new input sample, $X_{new}$, the random forest predicts its class probabilities by aggregating the predictions from each decision tree. Let $T_1, T_2, \ldots, T_n$ represent the distinct decision trees within the random forest, with $n$ denoting the total number of trees. For a specific tree $T_i$, the class probabilities for $X_{new}$ are computed as follows:

\[ P_i(\text{class} \mid X_{new}) = \frac{N_{class,i}(X_{new})}{N_{samples,i}} \]

In this context, $N_{class,i}(X_{new})$ indicates the number of data points associated with a particular class in the leaf node of $T_i$ that $X_{new}$ reaches, and $N_{samples,i}$ represents the total sample count in that leaf node. The aggregated predicted class probabilities of the random forest model result from averaging the predictions made by each tree:

\[ P_{ensemble}(X_{new}) = \frac{1}{n} \sum_{i=1}^{n} P_i(\text{class} \mid X_{new}) \]

The function predict_proba() in random forest libraries computes and provides $P_{ensemble}(X_{new})$, presenting the class probabilities for the input sample $X_{new}$. These class probabilities form the new probabilistic feature set.
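The leaf-count ratio and the averaging over trees can be illustrated with a short sketch. It is written in plain Python rather than with a random forest library, and the function names and the three hypothetical leaf-count dictionaries are assumptions, not values from the study.

```python
def tree_class_probabilities(leaf_counts):
    """Per-tree probability: class count in the reached leaf divided by
    the total number of samples in that leaf."""
    total = sum(leaf_counts.values())
    return {label: count / total for label, count in leaf_counts.items()}

def ensemble_class_probabilities(leaves, classes=(0, 1)):
    """Average the per-tree class probabilities over all trees."""
    per_tree = [tree_class_probabilities(leaf) for leaf in leaves]
    return {
        c: sum(p.get(c, 0.0) for p in per_tree) / len(per_tree) for c in classes
    }

# Hypothetical leaf counts reached by one patient in a three-tree forest,
# keyed by class label (0 = survived, 1 = death event).
leaves = [{0: 8, 1: 2}, {0: 5, 1: 5}, {0: 9, 1: 1}]
probabilistic_features = ensemble_class_probabilities(leaves)
```

The resulting dictionary holds one averaged probability per class for the patient; collecting these values for every patient yields the kind of probabilistic feature set on which the downstream classifiers are trained.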

Hyperparameter tuning
Appropriate training and testing processes (Isabona, Imoize & Kim, 2022) are employed to determine the optimal hyperparameter values for the applied machine learning models. After the parameters were finalized, the accuracy scores of the machine learning algorithms improved. Table 3 provides a comprehensive list of the hyperparameters investigated in our research (Elgeldawi et al., 2021). The analysis findings, which also report the parameters used to obtain the best metric scores, demonstrate that hyperparameter tuning significantly improved the accuracy of our study's machine learning models.
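One common way to carry out such tuning is an exhaustive grid search, sketched below. The search strategy, the grid values, and the toy scoring function are all assumptions made for illustration; the study does not specify its search procedure here, and in practice the score would be a cross-validated accuracy.

```python
from itertools import product

def grid_search(param_grid, score_fn):
    """Evaluate every parameter combination and keep the best-scoring one."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

def toy_score(params):
    """Hypothetical stand-in for a cross-validated accuracy estimate."""
    return -abs(params["n_estimators"] - 100) - abs(params["max_depth"] - 10)

# Hypothetical grid for a tree ensemble.
grid = {"n_estimators": [50, 100, 300], "max_depth": [5, 10, 20]}
best_params, best_score = grid_search(grid, toy_score)
```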

RESULTS AND DISCUSSION
This section examines the experimental approach and the outcomes of the studies to ascertain the likelihood of survival for heart patients. The results incorporating complete attributes are presented, with a binary classification task utilizing the Death Event attribute to discern whether a patient survived or passed away before the 130-day follow-up period. The SMOTE technique is employed to balance the dataset, and hyperparameter tuning is utilized to improve the forecasting scores of machine learning classifiers. The balanced dataset is then employed to train machine learning algorithms, with evaluations conducted for accuracy, precision, recall, and F1 score.

Experiment design
The performance of the algorithms has been examined using supervised ML models. The ML classifiers were built in the Python programming language using the Scikit-Learn library. The data is divided between the training and testing phases in an 80:20 ratio. The experiments were run on a system with 8 GB of RAM and an Intel(R) Core(TM) m3-7Y30 processor running at 1.00 and 1.61 GHz. Accuracy, precision, recall, and F1 score are employed to assess the ML algorithms. These measures are defined in terms of four outcomes:
• A ''TP'' (true positive) case is one in which both the actual and the forecasted values are positive.
• A ''TN'' (true negative) case is one in which both the actual and the forecasted values are negative.
• An ''FP'' (false positive) case is one in which the forecasted value is positive and the actual value is negative.
• An ''FN'' (false negative) case is one in which the forecasted value is negative and the actual value is positive.
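The four outcomes listed above can be counted directly from paired actual and predicted labels. The sketch below is illustrative; the function name and the choice of 1 as the positive label are assumptions.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true positives, true negatives, false positives and false
    negatives from paired actual and predicted labels."""
    tp = tn = fp = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1
            else:
                fp += 1
        else:
            if actual == positive:
                fn += 1
            else:
                tn += 1
    return tp, tn, fp, fn
```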

Accuracy
The accuracy score summarizes a model's overall proficiency in clinical forecasting. The algorithm's error rate is the complement of its accuracy, so accuracy improves as the error rate decreases. Accuracy is determined by dividing the number of correct predictions by the total number of predictions. The formula for calculating the accuracy score is as follows:

\[ \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \]

Precision
Precision is the proportion of instances predicted as positive that are actually positive. A high precision indicates few false positives, which matters in clinical contexts where a false alarm can lead to unnecessary intervention. The formula to determine the precision is as follows:

\[ \text{Precision} = \frac{TP}{TP + FP} \]

Recall
In the context of clinical applications, recall plays a crucial role, as it represents the percentage of accurately identified positive cases relative to the total number of actual positive instances. A higher recall score is particularly significant in this setting, as it indicates fewer false negatives. This means that the model correctly identifies a greater percentage of positive outcomes, which is crucial for ensuring that potential clinical conditions are not overlooked. The formula for calculating the recall score is as follows:

\[ \text{Recall} = \frac{TP}{TP + FN} \]

F1 score
The F1 score quantifies the balance between recall and precision. In the assessment of binary classification models, particularly in clinical contexts, it serves as a vital statistical indicator. It is the harmonic mean of precision and recall, so it is high only when both are high. In clinical applications, the F1 score is valuable for its ability to capture the trade-off between false positives and false negatives in a single number. The formula to determine the F1 score is listed below:

\[ F1 = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \]

Study results discussion without using proposed technique
Table 4 compares the algorithms without using our suggested strategy, reporting both performance metrics and training time. The GB classifier attained an accuracy score of 95 percent, with a precision of 95 percent, a recall of 95 percent, and an F1 score of 95 percent. The KNN model yielded the lowest accuracy at 58 percent, with a precision of 56 percent, a recall of 58 percent, and an F1 score of 57 percent. Regarding computation time, KNN and DT achieved the lowest training times of 0.003 s and 0.005 s, respectively. However, the performance metrics of the tested algorithms indicate that most models do not predict heart failure effectively, as the classes need to be better balanced. The classification report analysis of each method is detailed in Table 5. Figure 7 is a bar chart of the accuracy results of all applied machine learning (ML) algorithms without using the proposed approach. Figure 8 is a bar chart of the k-fold accuracy scores used to assess model overfitting, without using SMOTE.

Results validation using K-fold cross validation without SMOTE
The performance assessment of all algorithms, focusing on addressing overfitting issues, is presented in Table 6. The 10-fold cross-validation technique was employed to validate the robustness of our models. The examination results demonstrated a 95 percent accuracy score using the k-fold approach without employing the SMOTE technique. Figure 8 illustrates that certain algorithms fail to achieve balanced accuracy with the k-fold cross-validation method. Visual inspection indicates that the SVM, KNN, and ETC classifiers yielded low accuracy scores. The k-fold investigation revealed that all algorithms exhibited signs of overfitting and that the dataset needed rebalancing.
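The mechanics of 10-fold cross-validation can be sketched as a generator of train and test index sets. This is an unshuffled, unstratified illustration with a hypothetical function name; whether the study shuffled or stratified its folds is not stated. The row count of 299 matches the size of the original dataset.

```python
def k_fold_indices(n_rows, k=10):
    """Yield (train_indices, test_indices) pairs so that every row is used
    for testing exactly once across the k folds."""
    indices = list(range(n_rows))
    # Spread the remainder so that fold sizes differ by at most one row.
    fold_sizes = [n_rows // k + (1 if i < n_rows % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size

folds = list(k_fold_indices(299, k=10))
```

A model is trained on each `train_idx` and scored on the matching `test_idx`; the mean and standard deviation of the ten scores are then reported.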

Study results discussion using SMOTE technique
The performance indicators for the algorithms used in our proposed study are displayed in Table 7. Performance indicator results and computation time analyses were computed using our proposed method. The outcomes demonstrate that the tree-based ML algorithms employed to forecast heart failure achieved high performance metric scores. The outcomes of all the classifiers are shown in Fig. 9, along with the outcome from our top model, the RF classifier, which attained an accuracy score of 96.34 percent. The models with the lowest accuracy scores were k-nearest neighbors (KNN) (56 percent) and support vector machine (SVM) (57 percent). The computation time analysis reports the training time for all the models used in our study. For heart failure prediction, the best-performing model, RF, gives the highest accuracy score of 96.34 percent with a training time of 0.452 s. The LoR training time was 0.043 s, SVM was 0.040 s, DT was 0.005 s, XGB was 0.040 s, ETC was 0.340 s, and GB was 0.202 s. The models with the lowest training times were KNN (0.003 s) and GNB (0.004 s).

Comparative analysis using K-fold cross validation
Table 8 presents the performance evaluation of all algorithms under 10-fold cross-validation, which we used to check for overfitting in our models. The results of the investigation indicate that the 10-fold approach yielded a 91 percent accuracy score, in line with the results obtained on the test split.
Figure 10 shows that the algorithms achieve excellent accuracy scores when the k-fold cross-validation method is applied. According to visual analysis, SVM and KNN achieved low accuracy. The k-fold approach is employed to validate all applied algorithms, and the investigation reveals that the classifiers are balanced, yielding excellent results on test data.

Classification report results of employed models using SMOTE
For an overview of the target class categorization reports for each model, refer to Table 9. The categorization scores for the models were obtained using the suggested methodology.

Comparison analysis of current study with and without using SMOTE
Figure 11 compares the study results with and without the SMOTE technique. In this analysis, the performance of the LoR, SVM, and KNN models decreased with the application of SMOTE. However, SMOTE demonstrated effective performance with tree-based algorithms for forecasting HF survival in patients.

Performance analysis using proposed transfer learning features
For an overview of the target class categorization reports for each model, refer to Table 10. The categorization scores for the models were obtained using the suggested methodology.
The measures employed to evaluate our proposed research study include accuracy, precision, recall, and F1 score. The performance results demonstrate that our proposed approach achieved an accuracy score of 97.5 percent in evaluating classification results.
The investigation further substantiates that our proposed transfer learning approach is well suited to predicting heart failure survival.

K-fold cross-validation analysis of proposed transfer learning
The 10-fold cross-validation technique is employed to evaluate the performance of our proposed RF transfer learning model. Table 11 presents the results of the applied 10-fold cross-validation. The findings reveal that the proposed RF model with transfer learning achieved a cross-validation accuracy of 98.7% with a standard deviation of 0.0164, the best outcome among the evaluated models. The analysis indicates that our proposed approach with RF offers robust results for predicting heart failure survival.
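The way a mean cross-validation accuracy and its standard deviation are obtained can be sketched as follows. This is an illustrative, standard-library-only example, not the study's code: the synthetic one-dimensional data, the threshold classifier, and all function names are assumptions standing in for the patient data and the RF model.

```python
import random
import statistics


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle sample indices and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(X, y, fit, predict, k=10, seed=0):
    """Return per-fold accuracies; each fold is held out exactly once."""
    scores = []
    for held_out in k_fold_indices(len(X), k, seed):
        held = set(held_out)
        train = [i for i in range(len(X)) if i not in held]
        model = fit([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict(model, X[i]) == y[i] for i in held_out)
        scores.append(correct / len(held_out))
    return scores


# Hypothetical stand-in model: a threshold halfway between the class means.
def fit_threshold(X, y):
    mean0 = statistics.mean(x for x, label in zip(X, y) if label == 0)
    mean1 = statistics.mean(x for x, label in zip(X, y) if label == 1)
    return (mean0 + mean1) / 2


def predict_threshold(threshold, x):
    return int(x > threshold)


# Synthetic, well-separated two-class data (not the heart failure dataset).
rng = random.Random(42)
X = [rng.gauss(0.0, 1.0) for _ in range(150)] + [rng.gauss(3.0, 1.0) for _ in range(150)]
y = [0] * 150 + [1] * 150

scores = cross_validate(X, y, fit_threshold, predict_threshold, k=10)
mean_accuracy = statistics.mean(scores)
std_accuracy = statistics.pstdev(scores)
print(f"mean accuracy = {mean_accuracy:.3f}, std = {std_accuracy:.4f}")
```

A small standard deviation across folds suggests that the score does not depend strongly on which samples happen to be held out.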

Comparison with state of the art studies
We conducted a comparative analysis of our results against findings from previous studies, as outlined in Table 13. We included studies published in the last two years for comparison. The evaluation criteria encompassed the year of study, approach type, proposed approach, accuracy, precision, recall, and F1 score. Our investigation demonstrated that the random forest model outperformed the earlier studies. Notably, our novel approach, the RF+Transfer Learning technique, yielded the highest scores.

Confusion matrix analysis of our proposed approach
The confusion matrix analysis depicted in Fig. 12 supports the reported performance metrics of our proposed RF classifier. According to the analysis, 42 true negatives and 38 true positives were identified. Only one false negative and one false positive were observed.
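As a worked check, the standard metrics can be recomputed directly from the four confusion matrix counts quoted above. The sketch below uses the textbook definitions for the positive class; whether the reported precision, recall, and F1 scores use these definitions or some averaging over classes is not stated here, so treat the comparison as indicative only.

```python
# Counts taken from the confusion matrix described in the text.
tn, tp, fn, fp = 42, 38, 1, 1

total = tn + tp + fn + fp
accuracy = (tp + tn) / total           # share of all predictions that are correct
precision = tp / (tp + fp)             # correct positives among predicted positives
recall = tp / (tp + fn)                # correct positives among actual positives
f1 = 2 * precision * recall / (precision + recall)

print(f"samples={total} accuracy={accuracy:.3f} "
      f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
# → samples=82 accuracy=0.976 precision=0.974 recall=0.974 f1=0.974
```

The accuracy of 80/82 is consistent with the reported score of about 97.5%.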

Discussions and limitations
This research proposed a novel transfer learning-based feature engineering approach that generates a new probabilistic feature set from patient data using the ensemble trees method, Random Forest. The proposed approach achieved an accuracy of 97%. However, there is still a 3% error rate. We aim to further improve performance scores by optimizing the architecture of the proposed approach and implementing advanced mechanisms.

CONCLUSION
Processing raw clinical data from heart patients using machine learning classifiers has the potential to save lives. Identifying factors that increase the likelihood of heart failure enables the implementation of preventive measures to reduce death rates. This research focuses on forecasting HF survival through ML and utilizes the clinical records of 299 patients. The study employs a novel RF+ transfer learning approach and compares nine ML algorithms, namely LoR, RF, SVM, DT, XGB, GNB, KNN, ETC, and GB. To address the class imbalance, SMOTE is applied. Using SMOTE improves the accuracy of tree-based algorithms in predicting the survival of HF patients. The novel RF+ transfer learning approach with SMOTE achieves 0.97 accuracy, 0.98 precision, 0.98 recall, and a 0.98 F1 score. The performance of all employed ML algorithms is analyzed based on the complete set of attributes in the HF dataset. The results indicate that tree-based approaches are highly effective in achieving maximum accuracy. The proposed RF with the novel transfer learning approach outperforms the other models, with 97 percent accuracy and a computation time of 0.413. Further analysis confirms the higher accuracy of our proposed model. Overfitting of the models is investigated using 10-fold cross-validation.

Future work
Our research study has the potential to propel the medical field forward by aiding doctors in predicting the likelihood of survival for patients with heart failure. Additionally, it will assist healthcare professionals in identifying critical risk factors for those heart failure patients who survive. To enhance the robustness of our research, we plan to address the current limitations by incorporating additional patient data into the dataset. This expansion will involve managing more parameters and employing advanced techniques such as deep learning and feature engineering to improve the accuracy of predicting heart failure outcomes.

Figure 2 The overall quantity analysis.

Figure 4 The data analysis before balancing by SMOTE.

Figure 5 The data analysis after balancing by SMOTE.

Figure 7 Accuracy results of all applied algorithms without using the proposed approach. Full-size DOI: 10.7717/peerjcs.1894/fig-7

Figure 10 Accuracy scores under the K-fold method, used to check the applied models for overfitting. Full-size DOI: 10.7717/peerjcs.1894/fig-10

[Workflow figure; block labels: Heart Failure Dataset, SMOTE Based Data Balance, RF, New Transfer Based Probability Features, AI Based Patient Survival Prediction]
by inputting the heart failure clinical record dataset. Our research experiments revealed that the suggested feature engineering performs best in heart failure survival prediction scores. The random forest model is trained on a training set and then, through the function predict_probability(), predicts the class probabilities. Here, X denotes the input data attributes, and y represents the target labels.
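The mechanism described above can be sketched with scikit-learn, under several assumptions. The data are a synthetic stand-in generated with `make_classification` (the row and attribute counts are chosen for illustration), the hyperparameters are arbitrary, and scikit-learn's `predict_proba` is used in place of the `predict_probability()` function named in the text. The study's actual pipeline may differ.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the clinical records (assumed shape: 299 rows, 12 attributes).
X, y = make_classification(n_samples=299, n_features=12, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Step 1: fit a random forest on the original attributes X with labels y.
feature_model = RandomForestClassifier(n_estimators=100, random_state=0)
feature_model.fit(X_train, y_train)

# Step 2: the predicted class probabilities form the new probabilistic feature set.
prob_train = feature_model.predict_proba(X_train)
prob_test = feature_model.predict_proba(X_test)

# Step 3: train the final classifier on the probabilistic features only.
final_model = RandomForestClassifier(n_estimators=100, random_state=0)
final_model.fit(prob_train, y_train)
accuracy = final_model.score(prob_test, y_test)
print(f"probabilistic feature shape: {prob_train.shape}, test accuracy: {accuracy:.3f}")
```

Note that in this sketch the training-set probabilities come from a forest that has already seen those rows, which tends to make them optimistic. Generating them out-of-fold is one common way to reduce that effect.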

Figure 9 Accuracy on test data using the proposed SMOTE technique. Full-size DOI: 10.7717/peerjcs.1894/fig-9

Table 8 Analysis of overfitting on applied models using K-fold with our proposed study.
The investigation's findings demonstrate that GNB and SVM have low accuracy scores on the test data. Our proposed RF model achieved an excellent accuracy score of 96%, while the XGBoost and Gradient Boosting classifiers achieved an accuracy score of 95% in evaluating classification results. More precisely, the Random Forest classifier attained an accuracy score of 96.34% based on the analysis.